22 research outputs found

    Author Profiling in Social Media: Age, Gender and Language Variety Identification

    PhD thesis written by Francisco Manuel Rangel Pardo at the Universitat Politècnica de València, under the supervision of Dr. Paolo Rosso. The thesis was defended on June 3, 2016 at the same university, before a committee formed by Dr. Núria Bel (Universitat Pompeu Fabra), Dr. Raquel Martínez Unanue (Universidad Nacional de Educación a Distancia, UNED) and Dr. Rafael Berlanga Llavorí (Universitat Jaume I). The thesis was graded Excellent (Sobresaliente) Cum Laude. This work was partially funded by Autoritas Consulting SA (http://www.autoritas.net).

    Author Profiling in Social Media: The Impact of Emotions on Discourse Analysis

    [EN] In this paper we summarise the content of the keynote that will be given at the 5th International Conference on Statistical Language and Speech Processing (SLSP) in Le Mans, France, on October 23–25, 2017. In the keynote we will address the importance of inferring demographic information for marketing and security reasons. The aim is to model how language is shared within gender and age groups, taking into account its statistical usage. We will see how a shallow discourse analysis can be done on the basis of a graph-based representation in order to extract information such as how complicated the discourse is (i.e., how connected the graph is), how interconnected grammatical categories are, how far a grammatical category is from the others, how different grammatical categories are related to each other, how the discourse is modelled in different structural or stylistic units, which grammatical categories are most central in the discourse of a demographic group, and which connectors are most common in the linguistic structures used. Moreover, we will also see the importance of considering emotions in the shallow discourse analysis and the impact that this has. We carried out experiments on identifying gender and age, in both Spanish and English, using the PAN-AP-13 and PAN-PC-14 corpora, obtaining results comparable to the best performing systems of the PAN Lab at CLEF.
    The research work described in this paper was partially carried out in the framework of the SomEMBED project (TIN2015-71147-C2-1-P), funded by the Spanish Ministry of Economy, Industry and Competitiveness (MINECO).
    Rosso, P.; Rangel-Pardo, FM. (2017). Author Profiling in Social Media: The Impact of Emotions on Discourse Analysis. Lecture Notes in Computer Science. 10583:3-18. https://doi.org/10.1007/978-3-319-68456-7_1
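    As a rough illustration of the graph-based shallow discourse analysis summarised in this abstract, the sketch below builds a directed graph from a sequence of part-of-speech tags and computes two of the quantities mentioned: how connected the graph is (density) and how central each grammatical category is (degree centrality). The tag sequence, the function names and the choice of measures are assumptions made for illustration only; this is not the authors' implementation.

```python
def build_pos_graph(tags):
    """Directed graph with one node per tag and an edge a -> b
    for every pair of adjacent tags (a, b) in the sequence."""
    graph = {tag: set() for tag in tags}
    for a, b in zip(tags, tags[1:]):
        graph[a].add(b)
    return graph


def density(graph):
    """Fraction of possible directed edges (self-loops ignored) that are present."""
    n = len(graph)
    if n < 2:
        return 0.0
    edges = sum(len(targets - {source}) for source, targets in graph.items())
    return edges / (n * (n - 1))


def degree_centrality(graph):
    """(in-degree + out-degree) / (n - 1) for every node, self-loops ignored."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    degree = {node: 0 for node in graph}
    for source, targets in graph.items():
        for target in targets:
            if source != target:
                degree[source] += 1
                degree[target] += 1
    return {node: d / (n - 1) for node, d in degree.items()}


if __name__ == "__main__":
    # Hypothetical tag sequence for a short sentence (an assumption, not corpus data).
    tags = ["DET", "NOUN", "VERB", "DET", "ADJ", "NOUN"]
    graph = build_pos_graph(tags)
    print("density:", density(graph))
    print("centrality:", degree_centrality(graph))
```

    In a profiling setting, one such graph could be built per author or per demographic group, and the resulting measures used as features for a classifier.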

    A Low Dimensionality Representation for Language Variety Identification

    [EN] Language variety identification aims at labelling texts in a native language (e.g. Spanish, Portuguese, English) with their specific variety (e.g. Argentina, Chile, Mexico, Peru, Spain; Brazil, Portugal; UK, US). In this work we propose a low dimensionality representation (LDR) to address this task with five different varieties of Spanish: Argentina, Chile, Mexico, Peru and Spain. We compare our LDR method with common state-of-the-art representations and show an increase in accuracy of ~35%. Furthermore, we compare LDR with two reference distributed representation models. Experimental results show competitive performance while dramatically reducing the dimensionality (and increasing the big data suitability) to only 6 features per variety. Additionally, we analyse the behaviour of the employed machine learning algorithms and the most discriminating features. Finally, we employ an alternative dataset to test the robustness of our low dimensionality representation with another set of similar languages.
    The work of the first author was carried out in the framework of ECOPORTUNITY IPT-2012-1220-430000. The work of the last two authors was carried out in the framework of the SomEMBED TIN2015-71147-C2-1-P MINECO research project. This work was also supported by the Generalitat Valenciana under the grant ALMAMATER (PrometeoII/2014/030).
    Rangel-Pardo, FM.; Franco-Salvador, M.; Rosso, P. (2018). A Low Dimensionality Representation for Language Variety Identification. Lecture Notes in Computer Science. 9624:156-169. https://doi.org/10.1007/978-3-319-75487-1_13

    El Impacto del Coronavirus en nuestra Salud Mental

    The Coronavirus represents the greatest threat to physical health in modern times. Simultaneously, fear of the unknown and of the very real repercussions of the virus is threatening to impact the mental health of many around the world. To provide insights into the impact of the Coronavirus on our mental health, we are continuously monitoring millions of conversations on Twitter each day and analysing this enormous amount of data by means of psychological models trained with artificial intelligence techniques and deep neural networks.

    Uncovering Plagiarism - Author Profiling at PAN

    [EN] PAN is a yearly workshop and evaluation lab on uncovering plagiarism, authorship, and social software misuse, and has been organizing benchmark activities on these topics since 2009. An additional task, author profiling, has also recently been proposed. Author profiling, instead of focusing on individual authors, studies how language is shared by a class of people. It is a problem of growing importance in forensics, security and marketing applications. For instance, a person working in forensic linguistics may need to know the linguistic profile of a suspected text message (the language used by a certain type of person) and identify characteristics of its author, with language as evidence. Similarly, from a marketing viewpoint, companies may be interested in determining, through the analysis of blogs and online product reviews, what types of people like or dislike their products.
    Rosso, P.; Rangel Pardo, FM. (2014). Uncovering Plagiarism - Author Profiling at PAN. Ercim News. (96):49-49. http://hdl.handle.net/10251/49303

    Overview of the PAN'2016 - New Challenges for Authorship Analysis: Cross-genre Profiling, Clustering, Diarization, and Obfuscation

    The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-44564-9_28
    This paper presents an overview of the PAN/CLEF evaluation lab. During the last decade, PAN has been established as the main forum of digital text forensic research. PAN 2016 comprises three shared tasks: (i) author identification, addressing author clustering and diarization (or intrinsic plagiarism detection); (ii) author profiling, addressing age and gender prediction from a cross-genre perspective; and (iii) author obfuscation, addressing author masking and obfuscation evaluation. In total, 35 teams participated in all three shared tasks of PAN 2016 and, following the practice of previous editions, software submissions were required and evaluated within the TIRA experimentation framework.
    The work of the first author was partially supported by the SomEMBED TIN2015-71147-C2-1-P MINECO research project and by the Generalitat Valenciana under the grant ALMAMATER (PrometeoII/2014/030). The work of the second author was partially supported by Autoritas Consulting and by the Ministerio de Economía y Competitividad de España under grant ECOPORTUNITY IPT-2012-1220-430000.
    Rosso, P.; Rangel-Pardo, FM.; Potthast, M.; Stamatatos, E.; Tschuggnall, M.; Stein, B. (2016). Overview of the PAN'2016 - New Challenges for Authorship Analysis: Cross-genre Profiling, Clustering, Diarization, and Obfuscation. In Experimental IR Meets Multilinguality, Multimodality, and Interaction. Springer Verlag (Germany). 332-350. https://doi.org/10.1007/978-3-319-44564-9_28

    Design and baseline characteristics of the finerenone in reducing cardiovascular mortality and morbidity in diabetic kidney disease trial

    Background: Among people with diabetes, those with kidney disease have exceptionally high rates of cardiovascular (CV) morbidity and mortality and of progression of their underlying kidney disease. Finerenone is a novel, nonsteroidal, selective mineralocorticoid receptor antagonist that has been shown to reduce albuminuria in type 2 diabetes (T2D) patients with chronic kidney disease (CKD) while carrying only a low risk of hyperkalemia. However, the effect of finerenone on CV and renal outcomes has not yet been investigated in long-term trials. Patients and Methods: The Finerenone in Reducing CV Mortality and Morbidity in Diabetic Kidney Disease (FIGARO-DKD) trial aims to assess the efficacy and safety of finerenone compared to placebo at reducing clinically important CV and renal outcomes in T2D patients with CKD. FIGARO-DKD is a randomized, double-blind, placebo-controlled, parallel-group, event-driven trial running in 47 countries with an expected duration of approximately 6 years. FIGARO-DKD randomized 7,437 patients with an estimated glomerular filtration rate ≥ 25 mL/min/1.73 m² and albuminuria (urinary albumin-to-creatinine ratio ≥ 30 to ≤ 5,000 mg/g). The study has at least 90% power to detect a 20% reduction in the risk of the primary outcome (overall two-sided significance level alpha = 0.05), the composite of time to first occurrence of CV death, nonfatal myocardial infarction, nonfatal stroke, or hospitalization for heart failure. Conclusions: FIGARO-DKD will determine whether an optimally treated cohort of T2D patients with CKD at high risk of CV and renal events will experience cardiorenal benefits with the addition of finerenone to their treatment regimen. Trial Registration: EudraCT number: 2015-000950-39; ClinicalTrials.gov identifier: NCT02545049
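    To give a sense of what the stated power (90%, two-sided alpha = 0.05, 20% risk reduction) implies for an event-driven design, the sketch below applies Schoenfeld's approximation for the number of events needed in a two-arm time-to-event comparison. The abstract does not say how the trial's own calculation was done, so the formula choice, the 1:1 allocation, the reading of "20% reduction" as a hazard ratio of 0.80, and the function name are all assumptions.

```python
import math
from statistics import NormalDist


def required_events(hazard_ratio, alpha=0.05, power=0.90, allocation=0.5):
    """Schoenfeld approximation of the number of events needed to detect
    `hazard_ratio` with a two-sided test at level `alpha` and the given power.
    `allocation` is the fraction of patients in the treatment arm (assumed 1:1)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile matching the power
    log_hr = math.log(hazard_ratio)
    return (z_alpha + z_beta) ** 2 / (allocation * (1 - allocation) * log_hr ** 2)


if __name__ == "__main__":
    # Assumption: a 20% risk reduction is interpreted as a hazard ratio of 0.80.
    events = required_events(0.80)
    print(f"approximate events required: {math.ceil(events)}")
```

    Under these assumptions the approximation gives about 844 primary-outcome events. An actual trial would typically plan for more to allow for interim analyses and other design features.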

    Taking the pulse of Earth's tropical forests using networks of highly distributed plots

    Tropical forests are the most diverse and productive ecosystems on Earth. While better understanding of these forests is critical for our collective future, until quite recently efforts to measure and monitor them have been largely disconnected. Networking is essential to discover the answers to questions that transcend borders and the horizons of funding agencies. Here we show how a global community is responding to the challenges of tropical ecosystem research with diverse teams measuring forests tree-by-tree in thousands of long-term plots. We review the major scientific discoveries of this work and show how this process is changing tropical forest science. Our core approach involves linking long-term grassroots initiatives with standardized protocols and data management to generate robust scaled-up results. By connecting tropical researchers and elevating their status, our Social Research Network model recognises the key role of the data originator in scientific discovery. Conceived in 1999 with RAINFOR (South America), our permanent plot networks have been adapted to Africa (AfriTRON) and Southeast Asia (T-FORCES) and widely emulated worldwide. Now these multiple initiatives are integrated via ForestPlots.net cyber-infrastructure, linking colleagues from 54 countries across 24 plot networks. Collectively these are transforming understanding of tropical forests and their biospheric role. Together we have discovered how, where and why forest carbon and biodiversity are responding to climate change, and how they feed back on it. This long-term pan-tropical collaboration has revealed a large long-term carbon sink and its trends, as well as making clear which drivers are most important, which forest processes are affected, where they are changing, what the lags are, and the likely future responses of tropical forests as the climate continues to change.
    By leveraging a remarkably old technology, plot networks are sparking a very modern revolution in tropical forest science. In the future, humanity can benefit greatly by nurturing the grassroots communities now collectively capable of generating unique, long-term understanding of Earth's most precious forests.